We present a method developed by the NNPDF Collaboration that allows the inclusion of 
new experimental data into an existing set of parton distribution functions without the need 
for a complete refit. A Monte Carlo ensemble of PDFs may be updated by assigning each 
member of the ensemble a unique weight determined by Bayesian inference. The reweighted 
ensemble therefore represents the probability density of PDFs conditional on both the old 
and new data. This method is applied to the inclusion of W-lepton asymmetry data into the 
NNPDF2.1 fit producing a new PDF set, NNPDF2.2. 

NNPDF Approach 

The NNPDF methodology differs from other parton fitting approaches in two main aspects: 
the choice of parameterization and the treatment of uncertainties. Neural networks are used 
to parametrize the PDFs at the initial scale. These are chosen because they provide a robust, 
unbiased interpolation and permit a very flexible parameterization. An NNPDF fit has almost 
three hundred free parameters and is therefore free from the bias associated with the choice of 
an inflexible functional form. The flexibility of the neural network fit has been demonstrated 
in several ways. For example, the fit has been shown to be stable under changes in network 
architecture and under the addition of new PDFs. 

In order to obtain an accurate representation of PDF uncertainty, NNPDF fits are performed on 
pseudo data samples generated from the original experimental data by Monte Carlo sampling. 
Each sample, or 'replica', forms the basis for a separate fit, resulting in an ensemble of PDF 
replicas that faithfully represent the uncertainty in the experimental data, without the need for 
a tolerance criterion. 
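
The replica generation step can be illustrated with a minimal sketch. It assumes a purely Gaussian model in which the pseudo-data are drawn from a multivariate normal distribution centred on the measured values with the experimental covariance matrix; the function name `generate_pseudodata` and the toy numbers are hypothetical, and a real fit treats normalization and other correlated uncertainties with more care.

```python
import numpy as np

def generate_pseudodata(central, cov, n_rep, seed=0):
    """Draw n_rep pseudo-data replicas around the measured central values.

    Assumption: Gaussian fluctuations described by a single covariance matrix.
    """
    rng = np.random.default_rng(seed)
    # Each row is one replica of the full dataset.
    return rng.multivariate_normal(central, cov, size=n_rep)

# Toy dataset: three data points with uncorrelated uncertainties.
central = np.array([1.0, 2.0, 3.0])
cov = np.diag([0.01, 0.04, 0.09])
replicas = generate_pseudodata(central, cov, n_rep=1000)
```

Each row of `replicas` would then be fitted separately, so that the spread of the fitted PDFs reflects the spread of the data.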



These features of the NNPDF approach allow for an interesting exploitation of Bayesian 
inference to determine the impact of new datasets. 



Reweighting Method 

We shall discuss how an existing probability distribution in the space of PDFs may be updated 
with information from new data. To include the new data, one can of course perform a fit with 
the new, enlarged dataset. However, this is a time-consuming task, particularly for observables 
where no fast code is available. It is therefore desirable to have a faster method of including new 
data in order to assess its impact rapidly without the need for a full refit. NNPDF parton sets 
are supplied as an ensemble of $N$ parton distribution replicas $f_k$, representing the probability 
density in PDFs $\mathcal{P}(f)$ based upon the data in the existing fit. We can therefore include new 
data by weighting each replica $f_k$ in the ensemble by an associated weight $w_k$. If the replica 
weights are computed correctly, then reweighting is completely equivalent to a refit. 

In order to illustrate the reweighting method, consider the computation of the expected value of 
a PDF-dependent observable $\mathcal{O}[f]$. We note that as the NNPDF Monte Carlo ensemble is a good 
representation of the probability density $\mathcal{P}(f)$, the expectation value $\langle \mathcal{O}[f] \rangle$ can be calculated 
as a simple average, 

$$\langle \mathcal{O} \rangle = \frac{1}{N} \sum_{k=1}^{N} \mathcal{O}[f_k].$$

New data can be included into the existing ensemble by assigning each replica $f_k$ a weight 
$w_k$. This weight assesses the agreement between the replica and the new data. The reweighted 
ensemble now forms a representation of the probability distribution of PDFs $\mathcal{P}_{\rm new}(f)$ conditional 
on both the existing and new data. The mean value of the observable $\mathcal{O}$ taking account of the 
new data is then given by the weighted average 

$$\langle \mathcal{O} \rangle_{\rm new} = \frac{1}{N} \sum_{k=1}^{N} w_k \, \mathcal{O}[f_k],$$

where the weights are given in terms of the $\chi^2_k$ of the individual replica $f_k$ to the $n$ new data points by 

$$w_k = \frac{(\chi^2_k)^{(n-1)/2} \, e^{-\chi^2_k/2}}{\frac{1}{N} \sum_{j=1}^{N} (\chi^2_j)^{(n-1)/2} \, e^{-\chi^2_j/2}}.$$
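
As an illustration of the weight computation, the following sketch evaluates weights proportional to $(\chi^2_k)^{(n-1)/2} e^{-\chi^2_k/2}$, normalized so that they sum to $N$, and then forms the weighted average of an observable. The function names and the toy $\chi^2$ values are hypothetical; working in logarithms is a numerical choice made here to avoid overflow when $n$ is large, not something prescribed by the text.

```python
import numpy as np

def replica_weights(chi2, n_data):
    """Weights w_k ~ (chi2_k)^((n-1)/2) * exp(-chi2_k / 2), normalized to sum to N.

    chi2   : chi^2 of each replica to the new data (must be > 0)
    n_data : number of new data points n
    """
    chi2 = np.asarray(chi2, dtype=float)
    log_w = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    log_w -= log_w.max()          # rescale before exponentiating for stability
    w = np.exp(log_w)
    return w * len(w) / w.sum()   # enforce sum_k w_k = N

def reweighted_mean(obs, weights):
    """Weighted average <O>_new = (1/N) sum_k w_k O[f_k]."""
    return np.mean(np.asarray(weights) * np.asarray(obs))

# Toy example: four replicas, ten new data points.
chi2 = np.array([10.0, 12.0, 30.0, 11.0])
weights = replica_weights(chi2, n_data=10)
obs = np.array([0.9, 1.1, 2.0, 1.0])
mean_new = reweighted_mean(obs, weights)
```

In this toy example the third replica, which describes the new data poorly, receives a much smaller weight than the others and barely contributes to `mean_new`.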
Note that after reweighting a given ensemble of $N$ PDF replicas we no longer have the same 
efficiency in describing the distribution of PDFs. The reweighting procedure will often assign 
replicas very small weights, so that these replicas no longer contribute to the ensemble. The 
efficiency of the representation of the underlying distribution $\mathcal{P}_{\rm new}(f)$ will therefore be less than 
it would be in a new fit. The loss of information due to reweighting can be quantified using the 
Shannon entropy to determine the effective number of replicas in the reweighted set: 

$$N_{\rm eff} \equiv \exp\left\{ \frac{1}{N} \sum_{k=1}^{N} w_k \ln(N/w_k) \right\}.$$
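
A short sketch of the effective number of replicas, assuming weights normalized so that they sum to $N$; the function name `effective_replicas` is hypothetical. Replicas with zero weight are skipped, since $w \ln(N/w) \to 0$ as $w \to 0$.

```python
import numpy as np

def effective_replicas(weights):
    """N_eff = exp{ (1/N) sum_k w_k ln(N / w_k) } for weights summing to N."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    nz = w > 0                    # zero-weight replicas contribute nothing
    return np.exp(np.sum(w[nz] * np.log(n / w[nz])) / n)

n_eff_uniform = effective_replicas(np.ones(100))          # all replicas equal
n_eff_single = effective_replicas([4.0, 0.0, 0.0, 0.0])   # one replica dominates
```

The two limiting cases behave as expected: uniform weights give $N_{\rm eff} = N$, while a single surviving replica gives $N_{\rm eff} = 1$.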




Unweighting 



Once we have a reweighted PDF set, we would like to be able to produce a new PDF ensemble 
with the same probability distribution as the reweighted set, but without the need to include 
the weight information. A method of unweighting has therefore been developed, whereby the 
new set is constructed by deterministically sampling, with replacement, the weighted probability 
distribution. This means that replicas with a very small weight will no longer appear in the 
final unweighted set, while replicas with large weight will occur repeatedly. 
Figure 2: Graphical representation of the construction of a set of $N'_{\rm rep}$ unweighted replicas from a set of 
$N_{\rm rep} = 20$ weighted ones. Each segment is in one-to-one correspondence to a replica, and its length is 
proportional to the weight of the replica. The cases of $N'_{\rm rep} \gg N_{\rm rep}$ (top) and $N'_{\rm rep} = 10$ (bottom) are 
shown. 

If we define the probability for each replica, and the probability cumulants as 

$$p_k = \frac{w_k}{N_{\rm rep}}, \qquad P_k \equiv P_{k-1} + p_k = \sum_{j=0}^{k} p_j,$$

we can quantitatively describe the unweighting procedure. Starting with $N_{\rm rep}$ replicas with 
weights $w_k$, we determine $N_{\rm rep}$ new weights $w'_k$: 

$$w'_k = \sum_{j=1}^{N'_{\rm rep}} \theta\!\left( \frac{j}{N'_{\rm rep}} - P_{k-1} \right) \theta\!\left( P_k - \frac{j}{N'_{\rm rep}} \right).$$

These weights are therefore either zero or a positive integer. By construction they satisfy: 

$$N'_{\rm rep} \equiv \sum_{k=1}^{N_{\rm rep}} w'_k,$$

i.e. the new unweighted set consists of $N'_{\rm rep}$ replicas, simply constructed by taking $w'_k$ copies of 
the $k$-th replica, for all $k = 1, \ldots, N_{\rm rep}$. This procedure is illustrated graphically in Figure 2. 
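
The unweighting step can be sketched as follows, under the assumption that the integer weight of the $k$-th replica counts how many of the equally spaced points $j/N'_{\rm rep}$ fall in the segment between consecutive cumulants $P_{k-1}$ and $P_k$. The function name `unweight` is hypothetical, and points that coincide with a segment boundary are assigned to the lower segment, which is one possible convention.

```python
import numpy as np

def unweight(weights, n_new):
    """Return integer weights w'_k (copies of each replica) summing to n_new."""
    w = np.asarray(weights, dtype=float)
    cumulants = np.cumsum(w)
    cumulants /= cumulants[-1]                    # P_k, with the last one exactly 1
    points = np.arange(1, n_new + 1) / n_new      # j / N'_rep for j = 1..N'_rep
    # Index of the first segment whose upper edge P_k is >= the point.
    idx = np.searchsorted(cumulants, points, side="left")
    return np.bincount(idx, minlength=len(w))

# Toy example: four weighted replicas turned into four unweighted ones.
counts = unweight([2.0, 0.0, 1.0, 1.0], n_new=4)
# Indices of the replicas making up the unweighted set, with repetitions.
members = np.repeat(np.arange(len(counts)), counts)
```

In the toy example the replica with zero weight drops out and the replica with weight 2 appears twice in `members`.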



Testing Reweighting and Unweighting 

To verify the effectiveness of the reweighting procedure, we will show that including datasets 
by reweighting produces an ensemble of PDF replicas statistically equivalent to a full refit. We 
begin by producing a new NNPDF2.0 fit, including only DIS and Drell-Yan data. The data 
left out of the fit (Tevatron Run II inclusive jet data) is then reintroduced by reweighting. The 
resulting reweighted ensemble is then compared to the full NNPDF2.0 fit. 

In Figure 1 we see the gluon PDF for the three sets: the prior fit NNPDF2.0(DIS+DY), the 
reweighted set NNPDF2.0(DIS+DY) with jet data included, and the refitted full set NNPDF2.0. 
The figure shows excellent agreement between the reweighted set and the full fit. Differences 
stay well below statistical fluctuations. 



To obtain a more precise estimation of the statistical equivalence of the refitted and reweighted 
parton sets, and also to test the unweighting procedure, we may examine the statistical distances 
between the new unweighted distributions and the refitted set. The distance formulae are 
defined in Appendix A of Ref. If two sets give a description of the same underlying probability 
distribution and so are statistically equivalent, the distance between them will fluctuate around 
a value of one. At $d \sim 7$ the discrepancy between the two sets is at the one sigma level. In the 
case of the Tevatron jets reweighting exercise, we see in Figure 3 that these distances oscillate 
around one. The reweighted set is therefore equivalent to the refit and there is no significant 
loss of accuracy in the unweighting procedure. 
Figure 3: Distance between central values (left) and uncertainties (right) of the NNPDF2.0 PDFs and the 
NNPDF2.0 DIS+DY PDFs reweighted with Tevatron jet data and then unweighted. 

Having developed the unweighting procedure, we can perform another check on the consistency 
of the reweighting method. When adding more than one set of data by reweighting, our method 
must satisfy combination and commutation properties. Reweighting with both sets at once must be 
equivalent to reweighting with one, unweighting, and then reweighting with the other. Likewise, 
switching the order in which we reweight must produce an equivalent distribution. 
Figure 4: Multiple reweighting demonstration. Plots of gluon PDF (left) and valence PDF (right). 

To check that our procedure satisfies these properties, we perform a test using the Tevatron jet 
data as the first dataset and E605 fixed target Drell-Yan data as the second. In Figure 4 we 
compare the inclusion of the combined set with the inclusion of one set after the other. The 
result is clearly independent of the order in which the inclusion of single datasets is performed. 
A distance analysis performed on the three produced sets confirms that the reweighting method 
satisfies the combination and commutation requirements. 
Figure 5: Comparison of light quark and antiquark distributions at the scale $Q^2 = M_W^2$ from the NNPDF2.1 
and NNPDF2.2 global fits. Parton densities are plotted normalized to the NNPDF2.1 central value. 



NNPDF2.2 



The Bayesian reweighting method has been used to construct a new NNPDF parton set: NNPDF2.2. 
In this set we take as a prior ensemble the NNPDF2.1 fit and include by reweighting the W-lepton 
charge asymmetry measurements of the ATLAS, CMS and D0 collaborations. 

The NNPDF2.1 set provides a reasonable description of the new measurements, with $\chi^2_{\rm tot}/N_{\rm dat} = 2.22$. 
After reweighting with the new data this improves to an excellent level of agreement with 
$\chi^2_{\rm tot}/N_{\rm dat} = 0.81$. Having reweighted a prior set with $N_{\rm rep} = 1000$ initial replicas, 181 effective replicas remain, 
indicating that the data provides a substantial constraint. Using the unweighting procedure 
outlined above, we have produced the new PDF set with $N_{\rm rep} = 100$. 

Figure 5 demonstrates the impact of the new data on the light quark and antiquark PDFs. 
The uncertainties are significantly constrained by the data in two main regions: there is a 
reduction of around 20% at $x \sim 10^{-3}$ and 30% in the region $x \sim 10^{-2}$ to $x \sim 10^{-1}$. The overall 
fit quality improves slightly, from a total $\chi^2_{\rm tot}/N_{\rm dat}$ of 1.165 with NNPDF2.1 to 1.157. The 
constraints demonstrated here are the first such constraints upon parton distributions from LHC 
data. 

Such constraints are particularly important given the discrepancies between global parton 
distribution fits in flavour separation at medium to large x. The W-lepton charge asymmetry data 
included here may prove useful in resolving some of these discrepancies. Future LHC data will 
no doubt provide further constraints upon PDFs in this region. 